Abstract

The non-Gaussian quasi maximum likelihood estimator is frequently used in GARCH models with the intention of improving the efficiency of the GARCH parameter estimates. However, the method is usually inconsistent unless the quasi-likelihood happens to be the true one. We identify an unknown scale parameter that is critical to the consistent estimation of the non-Gaussian QMLE. Building on the estimation of this unknown parameter, we propose a two-step non-Gaussian QMLE (2SNG-QMLE) for estimating the GARCH parameters. Without assumptions on the symmetry and unimodality of the distributions of innovations, we show that the non-Gaussian QMLE remains consistent and asymptotically normal under a general framework of non-Gaussian QMLE. Moreover, it has higher efficiency than the Gaussian QMLE, particularly when the innovation error has heavy tails. Two extensions are proposed to further improve the efficiency of the 2SNG-QMLE. The impact of the relative heaviness of the tails of the innovation and quasi-likelihood distributions on the asymptotic efficiency is thoroughly investigated. Monte Carlo simulations and an empirical study confirm the advantages of the proposed approach.



1 Introduction 



Volatility has been a crucial ingredient in modeling financial time series and designing risk management and trading strategies. It is often observed that volatilities tend to cluster together. This characteristic of financial data suggests that volatilities are autocorrelated and change over time. Engle (1982) proposed ARCH (autoregressive conditional heteroscedasticity) to model volatility dynamics by taking weighted averages of past squared forecast errors. This seminal idea led to a great richness and variety of volatility models. Among numerous generalizations and developments, the GARCH model of Bollerslev (1986) has been commonly used:



$$x_t = v_t \varepsilon_t, \qquad (1)$$

$$v_t^2 = c + \sum_{i=1}^{p} a_i x_{t-i}^2 + \sum_{j=1}^{q} b_j v_{t-j}^2. \qquad (2)$$

In this GARCH(p, q) model, the variance forecast takes a weighted average of not only past squared errors but also historical variances. The simplicity and intuitive appeal of the GARCH model, especially GARCH(1, 1), make it a workhorse and a good starting point in many financial applications.
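To make the recursion in (1)-(2) concrete, the following Python sketch simulates a GARCH(1, 1) path. It is an illustration only. The function name `simulate_garch11`, the Gaussian innovations, the burn-in length and the restriction $a + b < 1$ are our assumptions. The restriction keeps the unconditional variance $c/(1-a-b)$ finite, and the sketch uses that value only to initialise the recursion.

```python
import random

def simulate_garch11(c, a, b, n, seed=0, burn=500):
    """Simulate x_t = v_t * eps_t with v_t^2 = c + a * x_{t-1}^2 + b * v_{t-1}^2.

    Assumptions of this sketch: standard Gaussian innovations eps_t, a discarded
    burn-in period, and a + b < 1 so the recursion can start at c / (1 - a - b).
    """
    if c <= 0 or a < 0 or b < 0 or a + b >= 1:
        raise ValueError("need c > 0, a >= 0, b >= 0 and a + b < 1")
    rng = random.Random(seed)
    v2 = c / (1.0 - a - b)          # start at the unconditional variance
    x_prev = 0.0
    xs, v2s = [], []
    for t in range(n + burn):
        v2 = c + a * x_prev ** 2 + b * v2            # equation (2) with p = q = 1
        x_prev = v2 ** 0.5 * rng.gauss(0.0, 1.0)     # equation (1)
        if t >= burn:
            xs.append(x_prev)
            v2s.append(v2)
    return xs, v2s

xs, v2s = simulate_garch11(c=0.05, a=0.1, b=0.85, n=1000)
print(len(xs), min(v2s) >= 0.05)   # 1000 True
```

Because each conditional variance equals $c$ plus non-negative terms, it never falls below $c$. The large persistence $a + b = 0.95$ in the usage line produces the volatility clustering described above.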

Earlier literature on inference from ARCH/GARCH models is based on maximum likelihood estimation (MLE) with a conditional Gaussian assumption on the innovations. Plenty of empirical evidence, however, has documented heavy-tailed and asymmetric innovation distributions of $\varepsilon_t$, rendering this assumption unjustified; see for instance Diebold (1988). Consequently, MLE using Student's t or generalized Gaussian likelihood functions has been introduced; see e.g. Engle and Bollerslev (1986), Bollerslev (1987), Hsieh (1989), and Nelson (1991). However, these methods may lead to inconsistent estimates if the distribution of the innovation is misspecified. Alternatively, the Gaussian MLE, regarded as a quasi maximum likelihood estimator (QMLE), may be consistent (see e.g. Elie and Jeantheau (1995)) and asymptotically normal, provided that the innovation has a finite fourth moment, even if it is far from Gaussian; see Hall and Yao (2003) and Berkes et al. (2003). The asymptotic theory dates back to work as early as 1986 for ARCH, to Lee and Hansen (1994) and Lumsdaine (1996) for GARCH(1, 1) under stronger conditions, and to Bollerslev and Wooldridge (1992) for GARCH(p, q) under high-level assumptions.

Nevertheless, the gain in robustness comes with an efficiency loss. Theoretically, the divergence of the Gaussian likelihood from the true innovation density may considerably increase the variance of the estimates. The estimates may then miss the Cramér-Rao bound by a wide margin, reflecting the cost of not knowing the true innovation distribution. Engle and González-Rivera (1991) suggested a semiparametric procedure that, according to their Monte Carlo simulations, can improve the efficiency of the parameter estimates by up to 50% over the QMLE. However, it is still incapable of capturing the total potential gain in efficiency; see also Linton (1993). Drost and Klaassen (1997) put forward an adaptive two-step semiparametric procedure based on a re-parametrization of the GARCH(1, 1) model with an unknown but symmetric error. González-Rivera and Drost (1999) compared its efficiency gain or loss against the Gaussian QMLE and the MLE. All these efforts would become void if the innovation fails to have a finite fourth moment. Hall and Yao (2003) considered the Gaussian QMLE in this case and showed that it converges asymptotically to stable distributions rather than to a normal distribution.

The empirical reason for the efficiency loss of the Gaussian QMLE is that financial data are generally heavy-tailed, so the conditional normality assumption is violated. For example, Bollerslev and Wooldridge (1992) reported that the sample kurtosis of the estimated residuals of the Gaussian QMLE on S&P 500 monthly data is 4.6, well above the Gaussian kurtosis of 3. It is therefore intuitively appealing to develop QMLEs based on non-Gaussian likelihoods, especially heavy-tailed ones. The efficiency loss of the Gaussian QMLE can then be greatly reduced by replacing the Gaussian likelihood with a heavy-tailed one.

In contrast with the majority of the literature, which focuses on the Gaussian QMLE for inference, rather limited attention has been paid to inference using the non-Gaussian QMLE. This may be partly because the Gaussian QMLE is robust against misspecification of the error distribution, while a directly applied non-Gaussian QMLE is not. In general, a non-Gaussian QMLE does not yield consistent estimates when the true error distribution deviates from the likelihood. Moreover, this inconsistency cannot be corrected even if a shape parameter indexing the non-Gaussian likelihood family is estimated together with the model parameters, unless the true innovation density is a member of this likelihood family. Otherwise, estimating the shape along with the model parameters simply picks the likelihood that is "least" biased, but the bias persists. Newey and Steigerwald (1997) considered the identification of the non-Gaussian QMLE for heteroscedastic parameters in general conditional heteroscedastic models. They also pointed out that the scale parameter may not be identified at its true value, since it is no longer a natural scale parameter for non-Gaussian densities.

A possible remedy for the non-Gaussian QMLE is to manipulate the model assumptions in order to maintain consistent estimation. For example, the true innovation density is sometimes taken for granted to be Student's t or generalized Gaussian. Alternatively, Berkes and Horváth (2004) showed that a non-Gaussian QMLE attains consistency and asymptotic normality if the original moment condition $E(\varepsilon_t^2) = 1$ on the true innovations is replaced by a corresponding different one. However, the moment condition $E(\varepsilon_t^2) = 1$ is an essential assumption. It enables $v_t$ to bear the natural interpretation of the conditional standard deviation, the notion of volatility. More importantly, the moment condition is part of the model specification, and it should be prior to and independent of the choice of likelihood. Changing the moment condition does not solve the robustness issue of the non-Gaussian QMLE. It merely delivers consistency for the correct combination of moment condition and non-Gaussian likelihood, which cannot be determined without knowing the true innovation distribution.
Therefore, we prefer a non-Gaussian QMLE method that is robust against error misspecification, more efficient than the Gaussian QMLE, independent of model assumptions, and yet practical. Such a method can also extend the use of the non-Gaussian QMLE in GARCH software packages. Current packages do include the choice of likelihood as an option, for example Student's t and generalized Gaussian, and the shape parameter can be specified or estimated. But, as discussed above, such a method is not robust against error misspecification. When running the estimation, one chooses a particular likelihood family in the hope that the true innovation distribution falls into that family, but typically it does not.

The main contribution of this paper is a novel two-step non-Gaussian QMLE method, 2SNG-QMLE for short, that meets these desired properties. The key is the estimation of a scale adjustment parameter for the non-Gaussian likelihood, denoted $\eta_f$, which ensures the consistency of the non-Gaussian QMLE under any error distribution. $\eta_f$ is estimated through the Gaussian QMLE in the first step, and the estimated $\eta_f$ is then fed into the non-Gaussian QMLE in the second step. In the Gaussian QMLE, $\eta_f$ is held constant at unity, which partly explains why this quantity has been overlooked. In the non-Gaussian QMLE, however, $\eta_f$ is no longer constant. How much it deviates from unity measures how much asymptotic bias would be incurred by simply using the non-Gaussian QMLE without such an adjustment.



The second contribution is that we adopt a re-parameterized GARCH model (see also Newey and Steigerwald (1997) and Drost et al. (1997)) that separates the volatility scale parameter from the heteroscedastic parameters. Under this new parametrization, we derive the asymptotic behavior of the 2SNG-QMLE. The results show that the 2SNG-QMLE is more efficient than the Gaussian QMLE under various innovation settings. Furthermore, the new parametrization yields a clear-cut distinction in asymptotic behavior. For the heteroscedastic parameters, we can always achieve asymptotic normality, whereas the Gaussian QMLE has a slower convergence rate when the error does not have a finite fourth moment.

The outline of the paper is as follows. Section 2 introduces the model and assumptions. Section 3 discusses the estimation procedure and derives the asymptotic results for the 2SNG-QMLE. Section 4 proposes two extensions to further improve efficiency. Section 5 employs Monte Carlo simulations to verify the theoretical results. Section 6 conducts a real data analysis of stock returns. Section 7 concludes. The appendix provides all the mathematical proofs.

2 The Model and Assumptions 

The re-parameterized GARCH(p, q) model takes the following parametric form:

$$x_t = \sigma v_t \varepsilon_t, \qquad (3)$$

$$v_t^2 = 1 + \sum_{i=1}^{p} a_i x_{t-i}^2 + \sum_{j=1}^{q} b_j v_{t-j}^2. \qquad (4)$$

The model parameters are summarized in $\theta = (\sigma, \gamma')'$, where $\sigma$ is the scale parameter and $\gamma = (a', b')'$ is the autoregression parameter. The true parameter is in the interior of $\Theta$, a compact subset of $\mathbb{R}_+^{1+p+q}$, satisfying $\sigma > 0$, $a_i > 0$, $b_j > 0$. Throughout the paper, we use the subscript 0 to denote values under the true model. The innovations $\{\varepsilon_t\}_{t=1,\ldots,T}$ are i.i.d. random variables with mean 0, variance 1 and unknown density $g(\cdot)$. In addition, we assume that the GARCH process $\{x_t\}$ is strictly stationary and ergodic. The elementary conditions for the stationarity and ergodicity of GARCH models are discussed in Bougerol and Picard (1992).



We consider a parametric family of quasi likelihoods $\{\eta^{-1} f(\cdot/\eta) : \eta > 0\}$, indexed by $\eta$, for any given likelihood function $f$. Unlike the shape parameter often included in a Student's t likelihood function, $\eta$ is a scale parameter selected to reflect the penalty of model misspecification. More precisely, a specific quasi likelihood scaled by $\eta_f$ will be used in the estimation procedure. The parameter $\eta_f$ minimizes the discrepancy between the true innovation density $g$ and an unscaled misspecified quasi likelihood in the sense of the Kullback-Leibler information distance; see e.g. White (1982). Equivalently,

$$\eta_f = \arg\max_{\eta > 0} E\left\{-\log \eta + \log f\left(\frac{\varepsilon}{\eta}\right)\right\}, \qquad (5)$$

where the expectation is taken under the true model $g$.

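Note that $\eta_f$ depends only on the divergence between the two densities under consideration. It is therefore a universal measure of closeness that does not depend on the GARCH model. Once $\eta_f$ is given, the QMLE is defined by maximizing the following quasi likelihood with this model parameter $\eta_f$:

$$L_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} l_t(\theta) = \frac{1}{T}\sum_{t=1}^{T}\left(-\log(\sigma v_t) + \log f\left(\frac{x_t}{\eta_f \sigma v_t}\right)\right). \qquad (6)$$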

Clearly, this likelihood function differs from a regular one by the additional model parameter $\eta_f$. In fact, our approach is a generalization of the Gaussian QMLE and the MLE, as illustrated in the next proposition.
Proposition 1. If $f \propto \exp(-x^2/2)$ or $f = g$, then $\eta_f = 1$.



Moreover, it can be inferred from Newey and Steigerwald (1997) that an unscaled non-Gaussian likelihood function applied in this re-parametrized GARCH(p, q) setting generally fails to identify the volatility scale parameter $\sigma$, resulting in inconsistent estimates. We show in the next section that incorporating $\eta_f$ into the likelihood function facilitates the identification of the volatility scale parameter.

For convenience, we assume that the following regularity conditions are always satisfied. The function $f$ is twice continuously differentiable, and for any $\eta > 0$ we have $\sup_{\theta\in\Theta} E|l_t(\theta)| < \infty$, $E\sup_{\theta\in\mathcal{N}} |\nabla l_t(\theta)| < \infty$, and $E\sup_{\theta\in\mathcal{N}} |\nabla^2 l_t(\theta)| < \infty$, for some neighborhood $\mathcal{N}$ of $\theta_0$.
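As a minimal illustration of (6), the Python sketch below evaluates $L_T(\theta)$ for a re-parameterized GARCH(p, q) model given a data sequence, a value of $\eta_f$ and the log of a quasi likelihood $f$. Several choices here are our assumptions for the example rather than part of the method: the presample values ($x = 0$ and $v^2 = 1$ before the first observation), the function names, and the Gaussian log-density used in the usage line.

```python
import math

def log_gauss(x):
    # log of the standard normal density, one possible choice of f
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def quasi_loglik(x, sigma, a, b, eta_f, log_f):
    """Average quasi log-likelihood L_T(theta) of (6) for the model (3)-(4).

    Presample values x = 0 and v^2 = 1 are an assumption of this sketch.
    """
    p, q = len(a), len(b)
    x_hist = [0.0] * p    # x_{t-1}, ..., x_{t-p}
    v2_hist = [1.0] * q   # v_{t-1}^2, ..., v_{t-q}^2
    total = 0.0
    for xt in x:
        v2 = (1.0 + sum(ai * xi * xi for ai, xi in zip(a, x_hist))
              + sum(bj * vj for bj, vj in zip(b, v2_hist)))
        scale = sigma * math.sqrt(v2)                        # sigma * v_t
        total += -math.log(scale) + log_f(xt / (eta_f * scale))
        x_hist = ([xt] + x_hist)[:p]
        v2_hist = ([v2] + v2_hist)[:q]
    return total / len(x)

# with a = b = 0 the model is i.i.d. N(0, sigma^2); here sigma = 1
print(round(quasi_loglik([1.0, 0.5], 1.0, [0.0], [0.0], 1.0, log_gauss), 6))   # -1.231439
```

Note that $\sigma$ and $\eta_f$ enter the argument of $f$ only through their product. Rescaling one against the other therefore changes $L_T$ by a constant, which is why $\eta_f$ must be pinned down separately for $\sigma$ to be identified.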



3 Main Results 
3.1 Identification 

Identification is a critical condition for consistency. It requires that the expected quasi likelihood $\bar L(\theta) = E(l_t(\theta))$ has a unique maximum at the true parameter value $\theta_0$. To show that $\theta$ can be identified in the presence of $\eta_f$, we make the following assumptions:

Assumption 1. A quasi likelihood of the GARCH(p, q) model is selected such that
1. $v_t(\gamma_0) > 0$, and $v_t(\gamma)/v_t(\gamma_0)$ is not a constant if $\gamma \neq \gamma_0$.
2. The function $Q(\eta) = -\log \eta + E(\log f(\varepsilon/\eta))$ has a unique maximizer $\eta_f > 0$.

Note that the first point is the usual identification condition for the autoregression parameter $\gamma_0$, and that the second requirement is the key to the identification of $\sigma_0$.

Lemma 1. Given Assumption 1, $\bar L(\theta)$ has a unique maximum at the true value $\theta = \theta_0$.

The next lemma provides a few primitive sufficient conditions for the last statement of Assumption 1. These conditions also serve as a general guideline for choosing an appropriate quasi likelihood $f$.

Lemma 2. Assume that $f$ is continuously differentiable up to the second order, and let $h(x) = x\dot f(x)/f(x)$. Suppose that $\{\varepsilon_t\} \sim \varepsilon$ are i.i.d. with mean 0, variance 1 and a finite $p$-th moment. In addition, suppose that

1. $h(x) \le 0$.

2. $x\dot h(x) \le 0$, and the equality holds if and only if $x = 0$.

3. $|h(x)| \le C|x|^p$ and $|x\dot h(x)| \le C|x|^p$, for some constants $C > 0$ and $p > 0$.

4. $\limsup_{x\to\infty} h(x) < -1$.

Then $Q(\eta)$ has a unique maximum at some point $\eta_f > 0$. Furthermore, $\eta_f > 1$ if and only if $E h(\varepsilon) < -1$.

The last three conditions are more general than the concavity of the function $Q(\eta)$. For some commonly used likelihoods, such as the Student's t family, $Q$ is not concave. Nevertheless, these likelihoods still satisfy the above lemma. A few examples of families of likelihoods that satisfy Lemma 2 are given below.
Remark 1. If $f \propto \exp(-x^2/2)$, then $h(x) = -x^2$. If $f$ is the standardized $t_\nu$ distribution with $\nu > 2$, that is, $f \propto \left(1 + \frac{x^2}{\nu - 2}\right)^{-\frac{\nu+1}{2}}$, then $h(x) = -\frac{(\nu+1)x^2}{\nu - 2 + x^2}$. Both cases satisfy Lemma 2 with $p = 2$. In addition, if $\log f(x) = -|x|^\beta\left(\frac{\Gamma(3/\beta)}{\Gamma(1/\beta)}\right)^{\beta/2} + \text{const}$, the generalized Gaussian likelihood, then $h(x) = -\beta\left(\frac{\Gamma(3/\beta)}{\Gamma(1/\beta)}\right)^{\beta/2}|x|^\beta$. In this case, Lemma 2 is satisfied by choosing $p = \beta$.
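For the generalized Gaussian likelihood in Remark 1, the first-order condition $E\,h(\varepsilon/\eta) = -1$ of (5) can be solved in closed form. Writing $c_\beta = (\Gamma(3/\beta)/\Gamma(1/\beta))^{\beta/2}$, it gives $\eta_f^\beta = \beta c_\beta E|\varepsilon|^\beta$. The Python sketch below evaluates this formula for three kinds of innovations: standard normal, standardized Student's $t_\nu$, and standardized generalized Gaussian. The absolute-moment formulas are standard results, and the helper names are ours. The outputs can be compared with the entries of Table 1 in Section 5.

```python
import math

def gg_const(beta):
    # c_beta such that log f(x) = -c_beta * |x|**beta + const has unit variance
    return (math.gamma(3.0 / beta) / math.gamma(1.0 / beta)) ** (beta / 2.0)

def abs_moment_normal(r):
    # E|eps|^r for eps ~ N(0, 1)
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)

def abs_moment_std_t(r, nu):
    # E|eps|^r for a Student's t_nu rescaled to unit variance (requires r < nu, nu > 2)
    return ((nu - 2.0) ** (r / 2.0) * math.gamma((r + 1.0) / 2.0)
            * math.gamma((nu - r) / 2.0) / (math.sqrt(math.pi) * math.gamma(nu / 2.0)))

def abs_moment_std_gg(r, beta_g):
    # E|eps|^r for a unit-variance generalized Gaussian innovation with shape beta_g
    return (math.gamma((r + 1.0) / beta_g) / math.gamma(1.0 / beta_g)
            * (math.gamma(1.0 / beta_g) / math.gamma(3.0 / beta_g)) ** (r / 2.0))

def eta_f_gg(beta, abs_moment):
    # solves E h(eps / eta) = -1 with h(x) = -beta * c_beta * |x|**beta
    return (beta * gg_const(beta) * abs_moment(beta)) ** (1.0 / beta)

print(round(eta_f_gg(1.0, abs_moment_normal), 3))                    # 1.128
print(round(eta_f_gg(1.0, lambda r: abs_moment_std_t(r, 5.0)), 3))   # 1.04
```

The two printed values correspond to the row gg1.0 of Table 1 under gg2 (Gaussian) and $t_5$ innovations. Setting $\beta = 2$ recovers $\eta_f = 1$ for any unit-variance innovation, as stated in Proposition 1.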

3.2 The Distinction Between Gaussian and Non-Gaussian QMLE

First, consider the case in which $\eta_f$ is given or, more directly, the true error distribution is known. The following asymptotic analysis reveals the difference between the Gaussian QMLE and the non-Gaussian one.

Theorem 1. Assume that $\eta_f$ is known. Under Assumption 1, $\hat\theta_T \xrightarrow{P} \theta_0$, where $\hat\theta_T$ is the quasi likelihood estimator obtained by maximizing (6).

Next, we discuss the asymptotic normality of the QMLE. As usual, additional 
moment conditions are needed. 

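Assumption 2. Let $k = \left(\frac{1}{\sigma}, \frac{1}{v_t}\frac{\partial v_t}{\partial \gamma'}\right)'$, and let $k_0$ be its value at $\theta = \theta_0$.

1. $0 < E(h_1^2(\varepsilon)) < \infty$ and $0 < |E(h_2(\varepsilon))| < \infty$, with $h_1$ and $h_2$ as defined in Theorem 2.

2. $M = E(k_0 k_0') < \infty$.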

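Theorem 2. Under Assumptions 1 and 2, we have

$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, \Sigma_1), \qquad (7)$$

where $\Sigma_1 = \frac{E h_1^2(\varepsilon)}{(E h_2(\varepsilon))^2} M^{-1}$, $h_1(\varepsilon) = 1 + h(\varepsilon/\eta_f)$, and $h_2(\varepsilon) = \frac{\varepsilon}{\eta_f}\dot h(\varepsilon/\eta_f)$.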

The moment conditions in the first point of Assumption 2 depend only on the tails of the innovation density $g$ and the quasi likelihood $f$. A striking advantage of the non-Gaussian QMLE over its Gaussian alternative is that the former may require weaker conditions on the tail of the innovation. It is well known that the asymptotic normality of the Gaussian QMLE requires a finite fourth moment. By contrast, Remark 1 implies that any Student's t likelihood with more than 2 degrees of freedom has a bounded $h$. Hence no moment conditions are needed beyond those assumed in any GARCH model.

Moreover, it turns out that the model parameter $\eta_f$ has another interpretation: it is a bias correction for a simple non-Gaussian QMLE of the scale parameter, in that $\sigma_0\eta_f$ would be reported instead of $\sigma_0$. Therefore, the unscaled QMLE can consistently estimate $\sigma_0$ if and only if $\eta_f = 1$. Proposition 1 hence reveals the distinction in consistency between the MLE, the Gaussian QMLE and the other alternatives.

In general, for an arbitrary likelihood, $\eta_f$ does not equal 1, which explains the popularity of the Gaussian QMLE, whose $\eta_f$ is exactly 1. It is therefore necessary to incorporate the bias-correction factor $\eta_f$ into the non-Gaussian QMLE, which may then achieve better efficiency than the Gaussian QMLE. However, since we have no prior information about the true innovation density, $\eta_f$ is unknown, and this estimator is infeasible. A promising way to resolve this issue is to estimate $\eta_f$ in a first step.
3.3 Two-Step Estimation Procedure

Estimating $\eta_f$ requires a sample of the true innovations. According to Proposition 1, the Gaussian QMLE does not require knowledge of $\eta_f$, so its residuals can serve as a substitute for the true innovation sample. We therefore propose the following two-step estimation procedure. In the first step, $\hat\eta_f$ is obtained by maximizing (5) with the estimated residuals from the Gaussian quasi likelihood estimation:

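$$\hat\eta_f = \arg\max_{\eta} \frac{1}{T}\sum_{t=1}^{T} l_2(x_t, \tilde\theta_T, \eta) = \arg\max_{\eta} \frac{1}{T}\sum_{t=1}^{T}\left(-\log \eta + \log f\left(\frac{\hat\varepsilon_t}{\eta}\right)\right), \qquad (8)$$

where

$$\tilde\theta_T = \arg\max_{\theta} \frac{1}{T}\sum_{t=1}^{T} l_1(x_t, \theta) = \arg\max_{\theta} \frac{1}{T}\sum_{t=1}^{T}\left(-\log(\sigma v_t) - \frac{x_t^2}{2\sigma^2 v_t^2}\right), \qquad (9)$$

and $\hat\varepsilon_t = x_t/(\tilde\sigma v_t(\tilde\gamma))$. Next, we maximize the non-Gaussian quasi likelihood with the plug-in $\hat\eta_f$ and obtain $\hat\theta_T$:

$$\hat\theta_T = \arg\max_{\theta} \frac{1}{T}\sum_{t=1}^{T} l_3(x_t, \hat\eta_f, \theta) = \arg\max_{\theta} \frac{1}{T}\sum_{t=1}^{T}\left(-\log(\sigma v_t) + \log f\left(\frac{x_t}{\hat\eta_f \sigma v_t}\right)\right). \qquad (10)$$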

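We call $\hat\theta_T$ the two-step non-Gaussian QMLE, 2SNG-QMLE for short. Alternatively, this two-step procedure can be viewed as a one-step generalized method of moments (GMM) procedure based on the score functions. Denote $s(x, \theta, \eta, \phi) = (s_1(x, \theta)', s_2(x, \theta, \eta), s_3(x, \eta, \phi)')'$. With $k_t(\theta) = \partial \log(\sigma v_t)/\partial\theta$ as in Assumption 2, the components are

$$s_1(x_t, \theta) = \frac{\partial}{\partial\theta} l_1(x_t, \theta) = k_t(\theta)\left(\frac{x_t^2}{\sigma^2 v_t^2} - 1\right), \qquad (11)$$

$$s_2(x_t, \theta, \eta) = \frac{\partial}{\partial\eta} l_2(x_t, \theta, \eta) = -\frac{1}{\eta}\left(1 + h\left(\frac{x_t}{\eta\sigma v_t}\right)\right), \qquad (12)$$

$$s_3(x_t, \eta, \phi) = \frac{\partial}{\partial\phi} l_3(x_t, \eta, \phi) = -k_t(\phi)\left(1 + h\left(\frac{x_t}{\eta\sigma v_t}\right)\right), \qquad (13)$$

with $\sigma$ and $v_t$ in (13) evaluated at $\phi$. The estimators are then obtained by GMM with the identity weighting matrix:

$$(\tilde\theta_T, \hat\eta_f, \hat\phi_T) = \arg\min_{\theta, \eta, \phi}\left(\frac{1}{T}\sum_{t=1}^{T} s(x_t, \theta, \eta, \phi)\right)'\left(\frac{1}{T}\sum_{t=1}^{T} s(x_t, \theta, \eta, \phi)\right), \qquad (14)$$

so our proposed estimator is simply $\hat\theta_T = \hat\phi_T$.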
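The first step (8) is a one-dimensional maximization. The Python sketch below maximizes the sample objective of (8) by golden-section search over $\log\eta$, for a user-supplied log quasi likelihood, here the standardized Student's $t_\nu$ of Remark 1. It is a sketch under stated assumptions, not the authors' implementation, and golden-section search is our own choice. It relies on the sample objective being unimodal, which holds for the Student's t likelihood because $1 + T^{-1}\sum_t h(\hat\varepsilon_t/\eta)$ increases in $\eta$. The search interval, tolerance and function names are illustrative. The usage line also substitutes simulated Gaussian draws for actual GARCH residuals. For the Gaussian likelihood the maximizer is $(T^{-1}\sum_t \hat\varepsilon_t^2)^{1/2}$, which gives a simple check.

```python
import math
import random

def log_gauss(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def make_log_std_t(nu):
    """Log-density of a Student's t_nu rescaled to unit variance (nu > 2), as in Remark 1."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    return lambda x: const - (nu + 1) / 2 * math.log1p(x * x / (nu - 2))

def eta_objective(eta, resid, log_f):
    # sample version of -log(eta) + E log f(eps / eta), the objective in (8)
    return -math.log(eta) + sum(log_f(e / eta) for e in resid) / len(resid)

def estimate_eta(resid, log_f, lo=0.05, hi=20.0, tol=1e-8):
    """Golden-section search over u = log(eta); assumes a unimodal objective on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc = eta_objective(math.exp(c), resid, log_f)
    fd = eta_objective(math.exp(d), resid, log_f)
    while b - a > tol:
        if fc > fd:                     # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = eta_objective(math.exp(c), resid, log_f)
        else:                           # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = eta_objective(math.exp(d), resid, log_f)
    return math.exp(0.5 * (a + b))

rng = random.Random(1)
resid = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # stand-in for Gaussian QMLE residuals
# population counterpart: 1.174 in Table 2 (row t4, column gg2)
print(round(estimate_eta(resid, make_log_std_t(4.0)), 3))
```

Searching over $\log\eta$ rather than $\eta$ keeps the bracket symmetric for scale changes and guarantees that every trial value of $\eta$ is positive.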
3.4 Asymptotic Theory 

Identification of the parameters $\theta$ and $\eta$ is straightforward. As in Theorem 1, consistency therefore holds:

Theorem 3. Given Assumption 1, $(\tilde\theta_T, \hat\eta_f, \hat\phi_T) \xrightarrow{P} (\theta_0, \eta_f, \theta_0)$. In particular, the 2SNG-QMLE $\hat\theta_T$ is consistent.

To obtain asymptotic normality, a finite fourth moment of the innovation is essential, because the first step employs the Gaussian QMLE. Alternative rate-efficient estimators could be adopted in the first step to avoid this moment condition. We nevertheless prefer the Gaussian QMLE for its simplicity and popularity in practice.
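Theorem 4. Assume that $E(\varepsilon^4) < \infty$ and that Assumptions 1 and 2 are satisfied. Then $\hat\eta_f$ and $\hat\theta_T$ are jointly asymptotically normal. That is,

$$\sqrt{T}\begin{pmatrix}\hat\eta_f - \eta_f\\ \hat\theta_T - \theta_0\end{pmatrix} \xrightarrow{d} N(0, \Omega), \qquad (15)$$

where

$$\Omega = \begin{pmatrix}\Sigma_\eta & \Pi'\\ \Pi & \Sigma_2\end{pmatrix}, \qquad (16)$$

$$\Sigma_\eta = \eta_f^2\, E\left(\frac{h_1(\varepsilon)}{E h_2(\varepsilon)} - \frac{\varepsilon^2 - 1}{2}\right)^2, \qquad (17)$$

$$\Pi = \eta_f \sigma_0 \left(\frac{E\left(h_1(\varepsilon)(\varepsilon^2 - 1)\right)}{2E h_2(\varepsilon)} - \frac{E(\varepsilon^2 - 1)^2}{4}\right) e_1, \qquad (18)$$

$$\Sigma_2 = \frac{E h_1^2(\varepsilon)}{(E h_2(\varepsilon))^2} M^{-1} + \sigma_0^2 \left(\frac{E(\varepsilon^2 - 1)^2}{4} - \frac{E h_1^2(\varepsilon)}{(E h_2(\varepsilon))^2}\right) e_1 e_1', \qquad (19)$$

and $e_1$ is a unit column vector of the same length as $\theta$, with first entry one and all other entries zero.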

Before a thorough efficiency analysis of the non-Gaussian QMLE, we first discuss the asymptotic properties of $\hat\eta_f$. Although $\hat\eta_f$ is obtained using the fitted residuals $\hat\varepsilon_t$ in (8), its asymptotic variance is not necessarily larger than that of the estimator based on the actual innovations $\varepsilon_t$. In fact, with the true innovations the asymptotic variance of the $\eta_f$ estimator is $\eta_f^2 E h_1^2/(E h_2)^2$. Comparing it with (17), we find that using the fitted residuals improves the efficiency as long as $E\left(h_1/E h_2 - (\varepsilon^2 - 1)/2\right)^2$ is smaller than $E\left(h_1/E h_2\right)^2$. This occurs when the non-Gaussian likelihood is close to the Gaussian likelihood. In the extreme case where the Gaussian likelihood is also chosen in the second step, $\eta_f$ equals exactly one and the asymptotic variance of $\hat\eta_f$ vanishes.

$\eta_f$ also reveals the asymptotic bias incurred by using the unscaled non-Gaussian QMLE. From (10), the 2SNG-QMLE $\hat\theta_T = (\hat\sigma_T, \hat\gamma_T)$ maximizes the log-likelihood. The unscaled non-Gaussian QMLE maximizes the log-likelihood without $\hat\eta_f$ in it and would therefore choose the estimator $(\hat\eta_f\hat\sigma_T, \hat\gamma_T)$. The estimator of the volatility scale parameter $\sigma$ is thus biased exactly by the factor $\eta_f$. This bias propagates if one uses the popular original parametrization. Recall

$$x_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \tilde c + \sum_{i=1}^{p}\tilde a_i x_{t-i}^2 + \sum_{j=1}^{q}\tilde b_j \sigma_{t-j}^2.$$

Clearly, $\sigma^2 a_i = \tilde a_i$, $b_j = \tilde b_j$ and $\sigma^2 = \tilde c$. Therefore, potential model misspecification would result in systematic biases in all the estimates of $\tilde a_i$ and $\tilde c$ if an unscaled non-Gaussian MLE, such as the Student's t likelihood, is applied without introducing $\eta_f$.
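The correspondence $\tilde c = \sigma^2$, $\tilde a_i = \sigma^2 a_i$, $\tilde b_j = b_j$ can be checked numerically. The Python sketch below computes $\sigma v_t$ from (3)-(4) and $\sigma_t$ from the original parametrization on the same data, and confirms that they coincide. It assumes presample values $x = 0$, $v^2 = 1$ and, correspondingly, $\sigma_t^2 = \sigma^2$; the function names are ours. The sketch also shows why a scale bias matters: multiplying $\sigma$ by a factor such as $\eta_f$ multiplies $\tilde c$ and every $\tilde a_i$ by its square while leaving $\tilde b_j$ unchanged.

```python
import math
import random

def to_original(sigma, a, b):
    # (sigma, a, b) of (3)-(4)  ->  (c~, a~, b~) of the original GARCH parametrization
    return sigma ** 2, [sigma ** 2 * ai for ai in a], list(b)

def vol_path(x, c, a, b, s2_init):
    """sqrt(c + sum a_i x_{t-i}^2 + sum b_j s_{t-j}^2) with presample x = 0, s^2 = s2_init."""
    p, q = len(a), len(b)
    x_hist, s2_hist, out = [0.0] * p, [s2_init] * q, []
    for xt in x:
        s2 = (c + sum(ai * xi * xi for ai, xi in zip(a, x_hist))
              + sum(bj * sj for bj, sj in zip(b, s2_hist)))
        out.append(math.sqrt(s2))
        x_hist = ([xt] + x_hist)[:p]
        s2_hist = ([s2] + s2_hist)[:q]
    return out

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
sigma, a, b = 0.3, [0.2], [0.7]
reparam = [sigma * v for v in vol_path(x, 1.0, a, b, 1.0)]   # sigma * v_t from (3)-(4)
c_t, a_t, b_t = to_original(sigma, a, b)
original = vol_path(x, c_t, a_t, b_t, sigma ** 2)            # sigma_t, presample sigma^2
print(max(abs(u - w) for u, w in zip(reparam, original)) < 1e-12)   # True
```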

3.5 Efficiency Gain over Gaussian QMLE 

We compare the efficiency of three estimators of $\theta$: the two-step non-Gaussian QMLE, the one-step (infeasible) non-Gaussian QMLE with known $\eta_f$, and the Gaussian QMLE. Their asymptotic variances are $\Sigma_2$, $\Sigma_1$ and $\Sigma_G$, respectively. The difference in asymptotic variances between the first two estimators is

$$\Sigma_2 - \Sigma_1 = \mu\, \sigma_0^2\, e_1 e_1', \qquad (20)$$

where

$$\mu = \frac{E(\varepsilon^2 - 1)^2}{4} - \frac{E h_1^2(\varepsilon)}{(E h_2(\varepsilon))^2}. \qquad (21)$$

Effectively, the sign and magnitude of $\mu$ summarize the advantage of knowing $\eta_f$. $\mu$ is usually positive when the true error has heavy tails and the non-Gaussian QMLE is based on a heavy-tailed likelihood, illustrating the loss from not knowing $\eta_f$. However, $\mu$ can also be negative when the true innovation has thin tails. In that case, not knowing $\eta_f$ is actually better when a heavy-tailed density is selected. Intuitively, this is because the two-step estimator incorporates the more efficient Gaussian QMLE into the estimation procedure. More importantly, the asymptotic variance of $\hat\gamma_T$ and the covariance between $\hat\sigma_T$ and $\hat\gamma_T$ are not affected by the estimation of $\eta_f$. In other words, we achieve the adaptivity property for $\gamma$: with an appropriate non-Gaussian QMLE, $\gamma$ can be estimated without knowing $\eta_f$ as well as if $\eta_f$ were known.
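For the generalized Gaussian likelihood with shape $\beta$, the quantities in (21) have closed forms. Write $m_r = E|\varepsilon|^r$. Then $h_1(\varepsilon) = 1 - |\varepsilon|^\beta/m_\beta$ and $h_2(\varepsilon) = -\beta|\varepsilon|^\beta/m_\beta$. It follows that $E h_1^2/(E h_2)^2 = (m_{2\beta}/m_\beta^2 - 1)/\beta^2$ and $\mu = (m_4 - 1)/4 - (m_{2\beta}/m_\beta^2 - 1)/\beta^2$. The Python sketch below evaluates $\mu$ for Gaussian and standardized Student's t innovations; the helper names are ours, and the moment formulas are standard. It reproduces the sign pattern described above:
- $\mu < 0$ for a Laplace ($\beta = 1$) likelihood with Gaussian innovations;
- $\mu > 0$ for the same likelihood with $t_5$ innovations;
- $\mu = 0$ whenever the quasi likelihood is Gaussian ($\beta = 2$).

```python
import math

def abs_moment_normal(r):
    # E|eps|^r for eps ~ N(0, 1)
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)

def abs_moment_std_t(r, nu):
    # E|eps|^r for a unit-variance Student's t_nu (requires r < nu)
    return ((nu - 2.0) ** (r / 2.0) * math.gamma((r + 1.0) / 2.0)
            * math.gamma((nu - r) / 2.0) / (math.sqrt(math.pi) * math.gamma(nu / 2.0)))

def mu_gg(beta, abs_moment):
    """mu of (21) for a generalized Gaussian quasi likelihood with shape beta.

    Needs finite E|eps|^4 and E|eps|^(2 * beta).
    """
    m_b, m_2b, m_4 = abs_moment(beta), abs_moment(2.0 * beta), abs_moment(4.0)
    ratio = (m_2b / m_b ** 2 - 1.0) / beta ** 2   # E h1^2 / (E h2)^2
    return (m_4 - 1.0) / 4.0 - ratio

t5 = lambda r: abs_moment_std_t(r, 5.0)
print(round(mu_gg(1.0, abs_moment_normal), 4))   # -0.0708
print(round(mu_gg(1.0, t5), 4))                  # 1.1494
```

For $\beta = 2$ the two terms of $\mu$ coincide for any innovation with a finite fourth moment. This is the case in which the two-step estimator collapses to the Gaussian QMLE.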

We next compare the efficiency of the Gaussian QMLE and the 2SNG-QMLE. By (7) with $f$ replaced by the Gaussian likelihood, we have

$$\Sigma_G = \frac{E(\varepsilon^2 - 1)^2}{4} M^{-1}.$$

It follows from Lemma 3 in the appendix that

$$\Sigma_G - \Sigma_2 = \mu \begin{pmatrix} \sigma_0^2 V_0' V V_0 & -\sigma_0 V_0' V \\ -\sigma_0 V V_0 & V \end{pmatrix}, \qquad (22)$$

where $U_0 = \dot v_t(\gamma_0)/v_t(\gamma_0)$, $V_0 = E(U_0)$ and $V = \mathrm{Var}(U_0)^{-1}$. Hence the last matrix in (22) is positive semidefinite with positive diagonal entries. Therefore, as long as $\mu$ is positive, the non-Gaussian QMLE is more efficient for both $\sigma$ and $\gamma$.

It is well known that financial data such as stock prices and exchange rates exhibit heavy tails. Therefore, if the selected likelihood has heavier tails than the Gaussian density, $\mu$ is typically positive, and the QMLE is then more efficient than the Gaussian QMLE.



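3.6 Efficiency Gap from the MLE

Denote the asymptotic variance of the MLE by $\Sigma_M$. By (7) with $f$ replaced by the true likelihood $g$, we have

$$\Sigma_M = \left(E(1 + h_g(\varepsilon))^2\right)^{-1} M^{-1},$$

where $h_g(x) = x\dot g(x)/g(x)$, and we have used the information identity $E(\varepsilon\dot h_g(\varepsilon)) = -E(1 + h_g(\varepsilon))^2$. The gap in asymptotic variance between the 2SNG-QMLE and the MLE is given by

$$\Sigma_2 - \Sigma_M = \left(\frac{E h_1^2}{(E h_2)^2} - \left(E(1 + h_g(\varepsilon))^2\right)^{-1}\right) M^{-1} + \sigma_0^2\left(\frac{E(\varepsilon^2 - 1)^2}{4} - \frac{E h_1^2}{(E h_2)^2}\right) e_1 e_1'. \qquad (23)$$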
An extreme case is that the selected likelihood $f$ happens to be the true innovation density. Being unaware of this, we still apply the two-step procedure and use the estimated $\eta_f$. Then the first term in (23) vanishes, but the second term remains. Consequently, $\hat\gamma_T$ reaches the efficiency bound, while the estimator of the volatility scale $\sigma$ does not, reflecting the penalty for ignorance of the true model. This example also shows that $\hat\theta_T$ cannot attain the efficiency bounds for all parameters unless both the true underlying density and the selected likelihood are Gaussian. This observation agrees with the comparison study of the MLE and their semiparametric estimator in González-Rivera and Drost (1999).



3.7 The Effect of the First Step Estimation 

We now further explore the oracle property of the estimator of the heteroscedastic parameters by considering a general first-step estimator. We have shown in Theorem 4 that the efficiency of the estimator of $\gamma$ is not affected by the first-step estimation of $\eta_f$ using the Gaussian QMLE, as if $\eta_f$ were known. Therefore, we may relax the finite fourth moment requirement on the innovation error by applying another efficient estimator in the first step. Moreover, even a first-step estimator with a lower convergence rate may leave the efficiency of the estimator of the heteroscedastic parameters $\gamma$ unaffected. That estimator is always consistent and asymptotically normal.

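Theorem 5. Suppose that the first-step estimator $\bar\theta_T$ has an influence function representation

$$\frac{T}{\lambda_T}(\bar\theta_T - \theta_0) = \frac{1}{\lambda_T}\sum_{t=1}^{T}\psi_t + o_p(1),$$

with the right-hand side converging to a non-degenerate distribution, and $\lambda_T \sim T^{1/\alpha}$ for some $\alpha \in [1, 2]$. Then the convergence rate of $\hat\sigma_T$ is also $T/\lambda_T$, while the same central limit theorem for $\hat\gamma_T$ as in Theorem 4 still holds, that is,

$$\sqrt{T}(\hat\gamma_T - \gamma_0) \xrightarrow{d} N\left(0, \frac{E h_1^2(\varepsilon)}{(E h_2(\varepsilon))^2} V\right),$$

where $V = \left(\mathrm{Var}\left(\frac{1}{v_t}\frac{\partial v_t}{\partial \gamma}\Big|_{\gamma = \gamma_0}\right)\right)^{-1}$.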



Theorem 5 applies to several estimators discussed in the literature. For example, Hall and Yao (2003) studied the Gaussian QMLE with ultra heavy-tailed innovations that violate the finite fourth moment condition. In their analysis, $\lambda_T$ is regularly varying at infinity with $\alpha \in [1, 2)$, and the resulting Gaussian QMLE suffers from lower convergence rates. By contrast, Drost and Klaassen (1997) suggested an M-estimator based on the score function of the logistic distribution to avoid moment conditions on the innovations. If applied in the first step, neither estimator would affect the efficiency of $\hat\gamma_T$.



4 Extensions

We discuss two ways to further improve the efficiency of the 2SNG-QMLE. The first chooses the non-Gaussian likelihood from a pool of candidate likelihoods to adapt to the data. The second takes an affine combination of the 2SNG-QMLE and the Gaussian QMLE, with weights chosen according to their covariance matrix in Theorem 4 so as to minimize the asymptotic variance of the resulting estimator.
4.1 Optimal Choice of Likelihood

Choosing a heavy-tailed quasi likelihood over the Gaussian likelihood has two distinct advantages. First, the $\sqrt{T}$-consistency of the 2SNG-QMLE of $\gamma$ no longer depends on a finite fourth moment, but only on a finite $E h_1^2/(E h_2)^2$. This condition is easily met by, for example, a generalized Gaussian likelihood with $\beta \le 1$. Second, even under a finite fourth moment, the heavy-tailed 2SNG-QMLE has a lower variance than the Gaussian QMLE if the true innovation is heavy tailed. A pre-specified heavy-tailed likelihood already has these two advantages. However, we can choose the quasi likelihood adaptively to further improve efficiency. This is done by minimizing the efficiency loss relative to the MLE, which is equivalent to minimizing $E h_1^2/(E h_2)^2$ over certain families of heavy-tailed likelihoods. We propose an optimal choice of non-Gaussian likelihood, with candidates from the Student's t family with $\nu > 2$ degrees of freedom and the generalized Gaussian family with $\beta \le 1$. Formally, for the true innovation distribution $g$ and a candidate likelihood $f$, define

$$A(f, g) = \frac{E_g(h_1^2)}{(E_g(h_2))^2}, \quad \text{where } h_1 = 1 + h\left(\frac{\cdot}{\eta_f}\right) \text{ and } h_2 = \frac{\cdot}{\eta_f}\,\dot h\left(\frac{\cdot}{\eta_f}\right).$$

Then the optimal likelihood is chosen from the t family and the generalized Gaussian (gg) family:

$$f^* = \arg\min_{f} \left\{ \{A(f_\nu^{t}, \hat g)\}_{\nu > 2},\ \{A(f_\beta^{gg}, \hat g)\}_{\beta \le 1} \right\}, \qquad (24)$$

where $\hat g$ denotes the empirical distribution of the estimated residuals from the first-step Gaussian QMLE. Because this procedure adapts the choice of likelihood to the data, the chosen quasi likelihood is expected to yield a more efficient 2SNG-QMLE than a pre-specified one. We confirm this point in simulation studies.

A 2SNG-QMLE with the optimal choice of likelihood runs the following four steps:
(a) Run the Gaussian QMLE and obtain the estimated residuals.
(b) Solve the optimization (24) and obtain the optimal likelihood $f^*$.
(c) Obtain $\hat\eta_f$ using $f^*$ and the estimated residuals.
(d) Run the 2SNG-QMLE with $f^*$ and $\hat\eta_f$.
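Step (b) can be illustrated for the generalized Gaussian part of (24). For a gg likelihood with shape $\beta$, the sample maximizer of (8) satisfies $\hat\eta_f^\beta = \beta c_\beta\,\hat m_\beta$, where $\hat m_r = T^{-1}\sum_t|\hat\varepsilon_t|^r$. The plug-in criterion therefore reduces to $A(f_\beta^{gg}, \hat g) = (\hat m_{2\beta}/\hat m_\beta^2 - 1)/\beta^2$, which does not require $\hat\eta_f$ explicitly. The Python sketch below minimizes this criterion over a finite grid of $\beta$ values. The grid and the function names are our simplifications. So is the restriction to the gg family: the t family would need a numerical $\hat\eta_f$ for each $\nu$.

```python
def abs_moment_hat(resid, r):
    # empirical E|eps|^r from the first-step residuals
    return sum(abs(e) ** r for e in resid) / len(resid)

def a_gg(resid, beta):
    """Plug-in A(f, g_hat) = E h1^2 / (E h2)^2 for a gg quasi likelihood with shape beta."""
    m_b = abs_moment_hat(resid, beta)
    m_2b = abs_moment_hat(resid, 2.0 * beta)
    return (m_2b / m_b ** 2 - 1.0) / beta ** 2

def choose_gg_beta(resid, betas=(0.2, 0.4, 0.6, 0.8, 1.0)):
    # grid-search version of the gg part of (24): the smallest A wins
    return min(betas, key=lambda beta: a_gg(resid, beta))

resid = [-2.0, -1.0, 1.0, 2.0]
print(round(a_gg(resid, 1.0), 4), round(a_gg(resid, 0.5), 4))   # 0.1111 0.1177
print(choose_gg_beta(resid, betas=(0.5, 1.0)))                  # 1.0
```

Because the criterion is invariant to rescaling the residuals, the choice of $\beta$ is unaffected by any error in the first-step estimate of $\sigma$.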

4.2 Aggregating 2SNG-QMLE and Gaussian QMLE

Another way to further improve the efficiency of the 2SNG-QMLE is aggregation. Both the Gaussian QMLE and the 2SNG-QMLE are consistent. An affine combination of the two, with weights chosen according to their joint asymptotic variance, therefore yields a consistent estimator that is more efficient than either. Define the aggregation estimator

$$\hat\theta_T^A = W\hat\theta_T + (I - W)\tilde\theta_T, \qquad (25)$$

where $W$ is a diagonal matrix with weights $(w_1, w_2, \ldots, w_{1+p+q})$ on the diagonal. From Theorem 4, the optimal weights are chosen by minimizing the asymptotic variance of each component of the aggregation estimator:

$$w_i^* = \arg\min_{w}\ w^2(\Sigma_2)_{i,i} + (1 - w)^2(\Sigma_G)_{i,i} + 2w(1 - w)(\Sigma_{2G})_{i,i} = \frac{(\Sigma_G)_{i,i} - (\Sigma_{2G})_{i,i}}{(\Sigma_2)_{i,i} + (\Sigma_G)_{i,i} - 2(\Sigma_{2G})_{i,i}}, \qquad (26)$$

where $\Sigma_{2G}$ denotes the asymptotic covariance between $\hat\theta_T$ and $\tilde\theta_T$.
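It turns out that all optimal aggregation weights $w_i^*$ are the same, namely

$$w^* = \frac{E\left[\frac{\varepsilon^2 - 1}{2}\left(\frac{\varepsilon^2 - 1}{2} - \frac{h_1(\varepsilon)}{E h_2(\varepsilon)}\right)\right]}{E\left(\frac{h_1(\varepsilon)}{E h_2(\varepsilon)} - \frac{\varepsilon^2 - 1}{2}\right)^2}. \qquad (27)$$

Proposition 2. The aggregated estimator $\hat\theta_T^A$ uses the optimal aggregation weights $W = w^* I$. Write $\kappa = E(\varepsilon^2 - 1)^2/4$, $\lambda = E h_1^2(\varepsilon)/(E h_2(\varepsilon))^2$ and $\rho = E(h_1(\varepsilon)(\varepsilon^2 - 1))/(2E h_2(\varepsilon))$. The asymptotic variance of $\hat\theta_T^A$ has diagonal terms

$$(\Sigma_A)_{1,1} = \sigma_0^2\kappa + \left((M^{-1})_{1,1} - \sigma_0^2\right)\frac{\lambda\kappa - \rho^2}{\lambda + \kappa - 2\rho}, \qquad (\Sigma_A)_{i,i} = (M^{-1})_{i,i}\,\frac{\lambda\kappa - \rho^2}{\lambda + \kappa - 2\rho},\ i \ge 2. \qquad (28)$$

Although the estimators of $\sigma$ and $\gamma$ have different asymptotic properties, their optimal aggregation weights coincide: $w_1^* = w_i^*$ for all $i$. The weight also depends only on the non-Gaussian likelihood and the innovation distribution, not on the GARCH model specification. The aggregated estimator $\hat\theta_T^A$ always has an asymptotic variance no larger than that of either the 2SNG-QMLE or the Gaussian QMLE. If the data are heavy tailed, e.g. $E\varepsilon^4$ is large or infinite, then by (27) it assigns a weight of approximately 1 to the 2SNG-QMLE and approximately 0 to the Gaussian QMLE. In practice, after running the 2SNG-QMLE with the optimal choice of likelihood, we can estimate the optimal aggregation weight $w^*$ by plugging the estimated residuals into (27).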
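The plug-in version of (27) replaces expectations by sample averages over the residuals. The Python sketch below does this for a generalized Gaussian quasi likelihood with shape $\beta$. For this likelihood, $h_1(\varepsilon)/E h_2(\varepsilon) = (|\varepsilon|^\beta/m_\beta - 1)/\beta$ with $m_\beta = E|\varepsilon|^\beta$, here estimated by the sample mean. The restriction to the gg family and the function names are ours. The minimized quadratic is $w^2\lambda + (1-w)^2\kappa + 2w(1-w)\rho$, and (27) is its minimizer. A simple sanity check is therefore that moving away from $w^*$, including to the endpoints $w = 0$ and $w = 1$, cannot lower it.

```python
import random

def _pieces(resid, beta):
    # sample versions of lambda, kappa, rho for a gg quasi likelihood with shape beta
    n = len(resid)
    m_b = sum(abs(e) ** beta for e in resid) / n
    a_t = [(abs(e) ** beta / m_b - 1.0) / beta for e in resid]   # h1(eps) / E h2(eps)
    b_t = [(e * e - 1.0) / 2.0 for e in resid]                     # (eps^2 - 1) / 2
    lam = sum(a * a for a in a_t) / n
    kap = sum(b * b for b in b_t) / n
    rho = sum(a * b for a, b in zip(a_t, b_t)) / n
    return lam, kap, rho

def aggregation_weight_gg(resid, beta):
    lam, kap, rho = _pieces(resid, beta)
    return (kap - rho) / (lam + kap - 2.0 * rho)     # sample version of (27)

def variance_factor(resid, beta, w):
    # w^2 lambda + (1 - w)^2 kappa + 2 w (1 - w) rho
    lam, kap, rho = _pieces(resid, beta)
    return w * w * lam + (1.0 - w) ** 2 * kap + 2.0 * w * (1.0 - w) * rho

rng = random.Random(3)
# standardized t5 draws: Z / sqrt(chi2_5 / 5), rescaled by sqrt(3/5) to unit variance
resid = [rng.gauss(0.0, 1.0) / (rng.gammavariate(2.5, 2.0) / 5.0) ** 0.5 * 0.6 ** 0.5
         for _ in range(2000)]
w = aggregation_weight_gg(resid, beta=1.0)
print(w, variance_factor(resid, 1.0, w) <= variance_factor(resid, 1.0, 0.0))
```

The second printed value compares the aggregated estimator with the pure Gaussian QMLE ($w = 0$). It is always `True`, because $w^*$ minimizes a convex quadratic.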
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5 Simulation Studies 



5.1 Model Free Characteristics 

The scale tuning parameter $\eta_f$ and the efficiency difference $\mu$ are generic characteristics of the non-Gaussian likelihood and the true innovation distribution, and they do not change when another conditional heteroscedastic model is used. We numerically evaluate how they vary with the non-Gaussian likelihood and the innovation distribution.



Table 1: $\eta_f$ for generalized Gaussian QMLEs (row) and innovation distributions (column)

         gg0.2   gg0.6   gg1     gg1.4   gg1.8   gg2     t3      t5      t7      t11
gg0.2    1.000   6.237   8.901   10.299  11.125  11.416  8.128   9.963   10.483  10.885
gg0.6    0.271   1.000   1.291   1.434   1.515   1.544   1.159   1.384   1.443   1.487
gg1      0.354   0.844   1.000   1.073   1.114   1.128   0.900   1.040   1.074   1.098
gg1.4    0.537   0.873   0.962   1.000   1.022   1.029   0.883   0.977   0.998   1.012
gg1.8    0.811   0.952   0.981   0.993   1.000   1.002   0.946   0.985   0.991   0.997



Table 2: $\eta_f$ for Student's t QMLEs (row) and innovation distributions (column)

         t2.5    t3      t4      t5      t7      t11     gg0.5   gg1     gg1.5   gg2
t2.5     1.000   1.231   1.425   1.506   1.584   1.641   0.900   1.414   1.614   1.716
t3       0.815   1.000   1.151   1.216   1.275   1.318   0.756   1.150   1.301   1.375
t4       0.715   0.874   1.000   1.054   1.100   1.133   0.697   1.011   1.122   1.174
t5       0.690   0.836   0.953   1.000   1.043   1.071   0.691   0.966   1.061   1.107
t7       0.679   0.816   0.922   0.964   1.000   1.024   0.708   0.945   1.018   1.053
t11      0.690   0.823   0.916   0.953   0.980   1.000   0.749   0.941   0.998   1.021
t20      0.720   0.845   0.928   0.958   0.981   0.992   0.811   0.954   0.992   1.007
t30      0.742   0.862   0.939   0.965   0.981   0.992   0.846   0.966   0.993   1.004



Tables 1 and 2 show how $\eta_f$ varies over generalized Gaussian likelihoods and Student's t likelihoods with different parameters, respectively. For each row, which amounts to fixing a quasi likelihood, the lighter the tails of the innovation errors, the larger $\eta_f$. Furthermore, $\eta_f > 1$ for innovation errors whose tails are lighter than those of the likelihood, and $\eta_f < 1$ for innovations whose tails are heavier. Therefore, if the non-Gaussian likelihood has heavier tails than the true innovation, we should shrink the data in order to obtain consistent estimates. On the other hand, if the quasi likelihood is lighter tailed than the true innovation, we should magnify the data.
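The row pattern can be checked in closed form in one special case. Assuming the generalized Gaussian quasi-likelihood is standardized to unit variance, $f(x) \propto \exp(-|x/s|^\beta)$ with $s = \sqrt{\Gamma(1/\beta)/\Gamma(3/\beta)}$, we get $h(x) = xf'(x)/f(x) = -\beta|x/s|^\beta$, and the condition $E\{1 + h(\varepsilon/\eta_f)\} = 0$ from the proof of Lemma 2 solves to

$$\eta_f = \frac{(\beta\,E|\varepsilon|^\beta)^{1/\beta}}{s}, \qquad E|\varepsilon|^\beta = \frac{2^{\beta/2}\,\Gamma\left(\frac{\beta+1}{2}\right)}{\sqrt{\pi}} \text{ for } \varepsilon \sim N(0,1).$$

The sketch below (function names hypothetical) evaluates this formula. Under these assumptions it reproduces the gg2 column of Table 1 to about three decimals, and every value exceeds 1 because Gaussian innovations are lighter tailed than each gg quasi-likelihood with $\beta < 2$.

```python
import math


def gg_scale(beta):
    """Scale s for which f(x) proportional to exp(-|x/s|**beta) has unit variance."""
    return math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))


def abs_moment_normal(beta):
    """E|Z|**beta for Z ~ N(0, 1)."""
    return 2.0 ** (beta / 2.0) * math.gamma((beta + 1.0) / 2.0) / math.sqrt(math.pi)


def eta_f_gg_vs_gaussian(beta):
    """Root of E[1 + h(Z/eta)] = 0 with h(x) = -beta * |x/s|**beta and Z ~ N(0, 1)."""
    return (beta * abs_moment_normal(beta)) ** (1.0 / beta) / gg_scale(beta)
```

For instance, `eta_f_gg_vs_gaussian(1.0)` equals $\sqrt{2/\pi}\,\sqrt{2} = 2/\sqrt{\pi} \approx 1.128$, the Laplace (gg1) row entry in the gg2 column of Table 1, and `eta_f_gg_vs_gaussian(2.0)` returns 1, since the Gaussian QMLE needs no rescaling.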

For each column (fixing an innovation distribution), in most cases the heavier the tails of the likelihood, the larger $\eta_f$. However, this monotone relationship fails for some ultra-heavy-tailed innovations, for which $\eta_f$ shows a "smile" pattern; see, e.g., the gg0.2 column of Table 1 (1.000, 0.271, 0.354, 0.537, 0.811 from the gg0.2 row to the gg1.8 row) or the t2.5 column of Table 2. The non-monotonicity in the likelihood dimension indicates that determining $\eta_f$ requires more information about the likelihood than just the asymptotic behavior of its tails.

Tables 3 and 4 show the dependence of $\mu$ on the true innovation (column) and the non-Gaussian likelihood (row). From the tables we see that in most cases $\mu$ is positive, which means that non-Gaussian QMLE shows an improvement. But when heavy-tailed likelihoods are applied to true innovations with moderate or thin tails, $\mu$ turns negative, which means that Gaussian QMLE performs better.

Looking at each column, i.e., fixing the innovation distribution, non-Gaussian QMLE gains the most over Gaussian QMLE when the non-Gaussian likelihood coincides with the innovation distribution (MLE). Looking at each row, i.e., fixing a non-Gaussian likelihood, its relative performance improves as the true innovation distribution becomes more heavy tailed, even beyond the MLE point where the true innovation and the likelihood coincide. This is because $\mu$ is a relative measure of non-Gaussian over Gaussian QMLE, not an absolute measure of asymptotic variance. When the true innovation is heavier tailed than the non-Gaussian likelihood, non-Gaussian QMLE does not perform as well as MLE, but Gaussian QMLE deteriorates even more than it would if the true innovation coincided with the non-Gaussian likelihood. Therefore, even though the absolute efficiency of non-Gaussian QMLE, in terms of asymptotic variance, drops, its performance relative to Gaussian QMLE actually increases.

To summarize the variation of $\mu$, one can arrange distributions along a line according to the asymptotic behavior of their tails, that is, according to how heavy their tails are, with thin tails on the left and heavy tails on the right. We then place the non-Gaussian likelihood, the Gaussian likelihood and the true innovation distribution on this line. The sign and size of $\mu$ depend on where the true innovation distribution lies: (a) if it lies to the right of the non-Gaussian likelihood, $\mu$ is positive and large; (b) if it lies to the left of the Gaussian, $\mu$ is negative and large in absolute value; (c) if it lies between the non-Gaussian and the Gaussian likelihoods, $\mu$ is determined by which of the two likelihoods the innovation is closer to. This looks like a symmetric argument for the Gaussian and non-Gaussian likelihoods. But in financial applications we know that true innovations are heavy tailed. Even if the non-Gaussian likelihood is not the innovation distribution, we can still expect either (a), or (c) with the innovation closer to the non-Gaussian likelihood. In both cases $\mu > 0$, and non-Gaussian QMLE is more efficient than Gaussian QMLE.
Table 3: $\mu$ for generalized Gaussian QMLEs (row) and innovation distributions (column)

         gg0.2   gg0.6   gg1     gg1.4   gg1.8   gg2     t4.5    t5      t7      t11
gg0.2    484     1.773   -0.062  -0.335  -0.416  -0.436  2.411   0.929   -0.026  -0.274
gg0.6    482     1.978   0.195   -0.075  -0.157  -0.175  2.608   1.138   0.206   -0.030
gg1      474     1.839   0.250   0.017   -0.053  -0.071  2.590   1.149   0.267   0.054
gg1.4    443     1.424   0.209   0.040   -0.010  -0.022  2.369   1.008   0.234   0.068
gg1.8    328     0.589   0.089   0.022   0.003   -0.002  1.588   0.596   0.114   0.032


Table 4: $\mu$ for Student's t QMLEs (row) and innovation distributions (column)

         t4.5    t5      t7      t9      t15     t30     gg0.5   gg1     gg1.5   gg2
t2.5     2.534   1.045   0.071   -0.114  -0.263  -0.324  3.848   0.004   -0.296  -0.375
t3       2.626   1.145   0.189   0.011   -0.124  -0.183  3.871   0.124   -0.158  -0.223
t4       2.663   1.194   0.258   0.086   -0.038  -0.090  3.816   0.191   -0.067  -0.124
t5       2.664   1.200   0.277   0.114   -0.004  -0.054  3.770   0.211   -0.031  -0.084
t7       2.642   1.190   0.287   0.131   0.020   -0.022  3.667   0.222   -0.001  -0.051
t11      2.591   1.150   0.277   0.132   0.035   -0.004  3.500   0.212   0.016   -0.025



5.2 Verification of the Asymptotic Theory 

Now we verify the asymptotic formulas (16)-(17). We run 20000 simulations, each generating a sample of size T = 7000 from a GARCH(1,1) model. The model parameters are $\sigma_0 = 0.5$, $a_{1,0} = 0.35$, $b_{1,0} = 0.3$. The innovation errors follow a standardized skewed Student's t distribution with $\nu_0 = 7$ degrees of freedom and skewness parameter $\lambda_0 = 0.5$, so that the left tail is heavier than the right tail. We use a Student's t likelihood with $\nu = 4$ degrees of freedom in the non-Gaussian QMLE. We run the two-step procedure to obtain the estimate $\hat\eta_f$ and the non-Gaussian QMLE estimate $\hat\theta$. Figure 1 reports the standardized estimates of $(\sigma_0, a_{1,0}, b_{1,0}, \eta_f)$ compared to $N(0,1)$. Standardization is done by first subtracting the true value from each estimate, and then dividing by the theoretical asymptotic standard deviation given by Theorem 4. All plots confirm the validity of the asymptotic variance formulas (16)-(17).



[Figure 1 image not reproduced; one panel is titled "ARCH".]

Figure 1: Histogram of standardized 2SNG-QMLE and standard normal pdf (solid line). Normalization is done by first subtracting the true value from the estimates, and then dividing by the theoretical asymptotic standard deviation suggested by our theory.



5.3 Comparison with Gaussian QMLE and MLE 

We compare the efficiency of 2SNG-QMLE, Gaussian QMLE and MLE under various innovation error distributions. We do not perform the optimal choice of quasi likelihood in 2SNG-QMLE; instead, we fix the quasi likelihood to be a Student's t distribution with 4 degrees of freedom. The simulation is conducted on a GARCH(1,1) model with true parameters $(\sigma_0, a_{1,0}, b_{1,0}) = (0.5, 0.35, 0.3)$. To generate the innovation errors we use Student's t and generalized Gaussian distributions with various parameters. For each type of innovation distribution, we run 1000 simulations, each with T = 3000 observations. Tables 5 and 6 report the relative efficiencies of these three estimators in terms of ratios of sample variances and MSEs. The first ratio, Gaussian/2SNG, indicates how much 2SNG-QMLE outperforms (values above 1) or underperforms (values below 1) Gaussian QMLE. The second ratio, 2SNG/MLE, indicates how far 2SNG-QMLE is from the efficiency bound.
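The data-generating step of such a Monte Carlo design can be sketched as follows. This minimal simulator uses the conventional GARCH(1,1) parametrization $x_t = \sigma_t\varepsilon_t$, $\sigma_t^2 = c + a x_{t-1}^2 + b\sigma_{t-1}^2$, with Student's t innovations rescaled to unit variance. The mapping from $(c, a, b)$ to the $(\sigma_0, a_{1,0}, b_{1,0})$ parametrization used in this paper is not assumed here. The function name `simulate_garch11`, the burn-in length and the starting value are illustrative choices, not the authors' settings.

```python
import numpy as np


def simulate_garch11(T, c, a, b, nu, burn=500, seed=None):
    """Simulate x_t = sigma_t * eps_t, sigma_t^2 = c + a x_{t-1}^2 + b sigma_{t-1}^2,
    with eps_t a Student's t variable with nu > 2 d.o.f. rescaled to unit variance.

    Returns (x, sigma2, eps), each of length T, after discarding `burn` start-up values.
    """
    if not (c > 0 and a >= 0 and b >= 0 and a + b < 1 and nu > 2):
        raise ValueError("need c > 0, a, b >= 0, a + b < 1 and nu > 2")
    rng = np.random.default_rng(seed)
    n = T + burn
    # Var(t_nu) = nu / (nu - 2), so this rescaling gives unit-variance innovations
    eps = rng.standard_t(nu, n) * np.sqrt((nu - 2.0) / nu)
    x = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = c / (1.0 - a - b)  # start at the unconditional variance
    x[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = c + a * x[t - 1] ** 2 + b * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * eps[t]
    return x[burn:], sigma2[burn:], eps[burn:]
```

For example, `x, sig2, eps = simulate_garch11(3000, c=0.05, a=0.1, b=0.85, nu=7.0, seed=0)` produces one path of length 3000. Repeating this over seeds and fitting each estimator to every path is the kind of loop that underlies variance and MSE ratios such as those in Tables 5 and 6.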

In Table 5 the innovation distributions range from thin-tailed $t_{20}$ (approximately Gaussian) to heavy-tailed $t_{2.5}$. Biases are small, so the variance and MSE ratios are nearly the same. For the first two thin-tailed cases, $t_{20}$ and $t_{15}$, Gaussian QMLE outperforms 2SNG-QMLE by a small margin. In all other cases 2SNG-QMLE outperforms Gaussian QMLE. In heavy-tailed cases such as $t_5$, 2SNG-QMLE performs nearly as well as MLE and reduces standard deviations by 15% to 60% relative to Gaussian QMLE. In the ultra-heavy-tailed cases ($t_4$, $t_3$ and $t_{2.5}$), the fourth moment no longer exists, so Gaussian QMLE is not $T^{1/2}$-consistent, and its estimation precision quickly deteriorates, sometimes to an intolerable level. In contrast, 2SNG-QMLE using the $t_4$ likelihood does not require a finite fourth moment for $T^{1/2}$-consistency of $a_{1,0}$ and $b_{1,0}$, so the standard deviations for $a_{1,0}$ and $b_{1,0}$ remain nearly equal to those of MLE. The standard deviations for $\sigma_0$ are now larger than those of MLE, but still significantly smaller than those of Gaussian QMLE.

Table 5: Student's t innovations simulation

Innov.  Comparing   Ratio of variances        Ratio of MSEs
dist.   methods     σ0      a1,0    b1,0      σ0      a1,0    b1,0
t20     G./2SNG     0.929   0.901   0.936     0.929   0.898   0.936
        2SNG/MLE    1.092   1.122   1.089     1.091   1.126   1.089
t15     G./2SNG     0.942   0.960   0.961     0.939   0.948   0.960
        2SNG/MLE    1.112   1.121   1.087     1.114   1.131   1.087
t?      G./2SNG     1.115   1.186   1.108     1.118   1.185   1.109
        2SNG/MLE    1.109   1.022   1.020     1.019   1.023   1.020
t?      G./2SNG     1.216   1.260   1.186     1.217   1.266   1.186
        2SNG/MLE    1.036   1.024   1.031     1.037   1.026   1.031
t?      G./2SNG     1.355   1.528   1.302     1.355   1.552   1.303
        2SNG/MLE    1.??    1.022   1.??      1.??    1.022   1.??
t5      G./2SNG     1.526   2.495   1.405     1.547   2.530   1.409
        2SNG/MLE    1.025   1.001   1.015     1.027   1.001   1.015
t4      G./2SNG     2.074   7.244   1.847     2.125   7.478   1.858
        2SNG/MLE    1.065   1.??    1.??      1.071   1.??    1.??
t3      G./2SNG     2.687   31.40   2.535     2.850   33.26   2.580
        2SNG/MLE    1.235   1.??    1.??      1.264   1.??    1.??
t2.5    G./2SNG     1.960   93.91   2.649     2.051   101.5   2.664
        2SNG/MLE    2.371   1.037   1.062     2.625   1.037   1.062

In Table 6 the innovations range from thin-tailed gg4 to heavy-tailed gg0.4. For innovations gg1.2 and heavier, 2SNG-QMLE starts to outperform Gaussian QMLE. In all of these cases, the Student's $t_4$ 2SNG-QMLE performs close to MLE, as indicated by the standard deviations. In comparison, Gaussian QMLE's performance deteriorates as the tails grow heavier, particularly for gg0.6 and gg0.4, even though the fourth moments are finite in these cases.
Table 6: generalized Gaussian innovations simulation

Innov.  Comparing   Ratio of variances        Ratio of MSEs
dist.   methods     σ0      a1,0    b1,0      σ0      a1,0    b1,0
gg4     G./2SNG     0.743   0.742   0.769     0.748   0.736   0.771
        2SNG/MLE    1.705   1.843   1.571     1.696   1.886   1.566
Gauss.  G./2SNG     0.811   0.717   0.850     0.808   0.706   0.850
        2SNG/MLE    1.233   1.395   1.176     1.238   1.416   1.176
gg1.2   G./2SNG     1.045   1.007   1.019     1.047   1.006   1.016
        2SNG/MLE    1.076   1.113   1.070     1.076   1.117   1.071
gg1     G./2SNG     1.091   1.210   1.073     1.090   1.201   1.073
        2SNG/MLE    1.084   1.120   1.074     1.086   1.130   1.074
gg0.8   G./2SNG     1.258   1.736   1.237     1.239   1.689   1.229
        2SNG/MLE    1.082   1.022   1.044     1.096   1.068   1.048
gg0.6   G./2SNG     1.653   2.623   1.526     1.663   2.650   1.527
        2SNG/MLE    1.089   1.135   1.061     1.100   1.144   1.061
gg0.4   G./2SNG     1.951   4.619   1.772     1.958   4.760   1.764
        2SNG/MLE    1.170   1.204   1.095     1.191   1.210   1.098



5.4 Ultra-Heavy Tail Case 

Here we compare the efficiency when the innovations are transformations of stable-$\alpha$ distributions. The index $\alpha$ ranges from 1.9 down to 1.1, and the distributions are transformed so that they have no fourth moment but a unit second moment. Furthermore, they are asymmetric and not unimodal. Since their distribution functions are not explicit, MLE is difficult to obtain. Table 7 compares the performance of 2SNG-QMLE with an optimally chosen quasi likelihood against Gaussian QMLE. We again use the GARCH(1,1) model with true parameters $(\sigma_0, a_{1,0}, b_{1,0}) = (0.5, 0.35, 0.3)$, and run 2000 simulations with T = 3000. The candidate quasi likelihoods are Student's t distributions with degrees of freedom from 20 down to 2.5, and generalized Gaussian distributions with shape parameter from 4 down to 0.4.



Table 7: Stable innovations simulation

Innov.     Comparing   Ratio of variances        Ratio of MSEs
dist.      methods     σ0      a1,0    b1,0      σ0      a1,0    b1,0
α = 1.9    G./NG-opt   1.266   1.446   1.205     1.285   1.470   1.215
α = 1.7    G./NG-opt   2.502   5.072   2.175     2.551   5.301   2.178
α = 1.5    G./NG-opt   5.381   148.9   4.004     5.605   154.8   3.954
α = 1.3    G./NG-opt   9.774   499.1   6.911     10.10   524.5   6.868
α = 1.1    G./NG-opt   16.08   1313    10.19     16.94   1445    9.960



Gaussian QMLE deteriorates as the tails grow heavier (smaller $\alpha$). In particular, for $a_{1,0}$ it produces many large estimates, resulting in substantial upward bias and intolerable standard deviations. In contrast, 2SNG-QMLE shows little sample bias and small standard deviations. The table also shows that as the innovations grow heavier, 2SNG-QMLE delivers smaller standard deviations.

For the $\alpha = 1.9$ case, among 2000 simulations the algorithm chooses Student's t quasi likelihoods 1977 times and Gaussian likelihoods 23 times. Among the chosen Student's t likelihoods, the degrees of freedom spread from 5 to 20 and mostly concentrate on 6, 7, 9 and 12. In the remaining four cases, all chosen quasi likelihoods are Student's t. For $\alpha = 1.7$, the degrees of freedom concentrate on 3 and 4, with a small fraction of 5. For $\alpha = 1.5$, around 1650 simulations choose $t_{2.5}$ and the rest choose $t_3$. For $\alpha = 1.3$ and $\alpha = 1.1$, all chosen quasi likelihoods are $t_{2.5}$, the most heavy-tailed candidate.
6 Empirical Work

We fit a simple GARCH(1,1) model to Citigroup daily stock returns from January 3, 2008 to January 15, 2010. There are 514 trading days in the data. We report the estimated parameters using the old parametrization. The Gaussian QMLE estimates of $(c, a, b)$ are (0.6522, 0.2205, 0.7793). Clearly the data show a high degree of persistence, in that $a + b \approx 1$. The 2SNG-QMLE chooses gg1.2 as the optimal likelihood, and the estimates of the model parameters and $\eta_f$ are (0.7689, 0.2075, 0.7728) and 1.0458, respectively. Since $\hat\eta_f$ deviates from 1 by about 4.6%, there would be a significant bias if we ran gg1.2 QMLE without scale adjustment.

On the other hand, even a non-Gaussian QMLE that estimates the shape of the quasi likelihood cannot guarantee consistency. In fact, such a method only picks the likelihood within some distribution family that is "least" biased for the data, but the bias due to misspecification of the innovation distribution remains. We perform unscaled generalized Gaussian QMLE with shape estimation. The estimated shape is $\beta = 1.305$, which is close to 1.2, the shape of the optimal likelihood in 2SNG-QMLE. If we fix the shape at 1.305 and run 2SNG-QMLE again, $\hat\eta_f$ is still 1.033. This means that even when the shape of the quasi likelihood is estimated, unscaled non-Gaussian QMLE still incurs a 3.3% bias.

7 Conclusion 

This paper studies GARCH model estimation when the innovation distribution is unknown, and it questions the efficiency of Gaussian QMLE and the consistency of currently used non-Gaussian QMLE. It proposes 2SNG-QMLE to tackle both issues. The first step runs a Gaussian QMLE whose purpose is to identify the scale tuning parameter $\eta_f$. The second step runs a non-Gaussian QMLE to estimate the model parameters. The quasi likelihood $f$ used in the second step can be a pre-specified heavy-tailed likelihood, properly scaled by $\eta_f$. It can also be chosen from a pool of candidate distributions in order to adapt to different characteristics of the unknown innovation distribution.

The asymptotic theory of 2SNG-QMLE does not depend on any symmetry or unimodality assumptions on the innovations. By adopting a different parametrization proposed by Newey and Steigerwald (1997) and incorporating $\eta_f$, 2SNG-QMLE improves the estimation efficiency over Gaussian QMLE. We show that the asymptotic behavior of 2SNG-QMLE can be broken down into two parts. For the heteroscedastic parameters $\gamma$, 2SNG-QMLE is always $T^{1/2}$-consistent and asymptotically normal, whereas the $T^{1/2}$-consistency of Gaussian QMLE relies on a finite fourth moment assumption. When $E\varepsilon^4 < \infty$, 2SNG-QMLE outperforms Gaussian QMLE in terms of smaller asymptotic variance, provided that the innovation distribution is reasonably heavy tailed, which is common for financial data. For the scale part $\sigma$, 2SNG-QMLE is not always $T^{1/2}$-consistent, but simulations show that $\sigma$ is usually estimated about as well as the heteroscedastic parameters $\gamma$. We also run simulations to compare the performance of Gaussian QMLE, 2SNG-QMLE and MLE. In most cases 2SNG-QMLE shows an edge and is close to MLE.

One possible generalization of 2SNG-QMLE is to linearly combine candidate quasi likelihoods in the second step. Instead of choosing a single likelihood, the log-likelihood objective in the second step would be a weighted average of candidate log-likelihoods, with the weights chosen adaptively to optimize the asymptotic variance. Such a combination would cover more dimensions of the innovation distribution and could further improve efficiency.



A Appendix section 

A.1 Proof of Lemma 1

Proof. 



o-o^^t(7o) 

$\le Q(\eta_f) - \log\left(\sigma_0 v_t(\gamma_0)\right) + \log\eta_f$



By assumption, the inequality is strict with positive probability. Therefore, by iterated expectations, $L_t(\theta) < L_t(\theta_0)$. □

A.2 Proof of Lemma 2

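Proof. Given the regularity conditions, we have $Q'(\eta) = -\frac{1}{\eta}E\left(1 + h\left(\frac{\varepsilon}{\eta}\right)\right)$. Denote $H(\eta) = E\left(1 + h\left(\frac{\varepsilon}{\eta}\right)\right)$. Then $Q''(\eta) = \frac{1}{\eta^2}H(\eta) - \frac{1}{\eta}H'(\eta)$, where $H'(\eta) = -\frac{1}{\eta^2}E\left(\varepsilon h'\left(\frac{\varepsilon}{\eta}\right)\right)$, for any $\eta > 0$. Because $E\left(\varepsilon h'\left(\frac{\varepsilon}{\eta}\right)\right) < 0$, we have $H'(\eta) > 0$. Next, $\lim_{\eta\to+\infty}H(\eta) = 1$, since

$$\lim_{\eta\to+\infty}|H(\eta) - 1| = \lim_{\eta\to+\infty}\left|E\,h\left(\frac{\varepsilon}{\eta}\right)\right| \le \lim_{\eta\to+\infty}\frac{C\,E|\varepsilon|^p}{\eta^p} = 0.$$

On the other hand, by Fatou's lemma along with conditions 1 and 4, we have

$$\limsup_{\eta\to0^+}H(\eta) = \limsup_{\eta\to0^+}E\left(1 + h\left(\frac{\varepsilon}{\eta}\right)\right) \le 1 + E\left(\limsup_{\eta\to0^+}h\left(\frac{\varepsilon}{\eta}\right)\right) < 0.$$

Thus $\lim_{\eta\to+\infty}H(\eta) = 1$, $\limsup_{\eta\to0^+}H(\eta) < 0$, and $H'(\eta) > 0$. Hence there exists a unique constant $\eta_f \in (0,\infty)$ such that $H(\eta_f) = 0$; consequently $Q'(\eta_f) = 0$ and $Q''(\eta_f) = -H'(\eta_f)/\eta_f < 0$. This concludes the proof. □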



A.3 Proof of Theorem 1

Proof. The proof is similar to that given in Elie and Jeantheau (1995), by verifying the conditions given in Pfanzagl (1969). □



A.4 Proof of Theorem 2

Proof. Let pt{0) = {cFVt)~'^k and (Tt{0) = avtl-y). Define the vector- valued function 
ip as 



~~de 



T 



Ed 

t=i 



/(— ) r 



)k 



For convenience, we consider the parameters ranging within a local neighborhood of the true values, as in Hall and Yao (2003). This simplification may not be critical, given that the estimator is proved to be consistent. By Taylor expansion,
atiOf = atiOof + AtiOone - 6o)+ \\ 9 - 60 f Ru{e)at{eoY (29)

Pt{e) = ptiOo) + Bt{eo){e - eo)+ we-Oo ||' R2t{o)at{eoy^ (so) 

where $R_{1t}(\theta)$ and $R_{2t}(\theta)$ are an $r$-vector and an $r \times r$ matrix, respectively, and $r = 1 + p + q$. On the other hand,



h{ 



hi 



£t(Tt{Oo) 



K-) - -h{^)at{eofpt{Oo)\e - 0o) 

Vf Vf Vf 



e-OofRstid) (31) 



where $R_{3t}(\theta)$ is an $r$-vector.

It has been shown in Hall and Yao (2003) that, for $R_t(\theta) = R_{1t}(\theta)$, $R_{2t}(\theta)$ and $R_{3t}(\theta)$, component-wise,

$$P\left(T^{-1}\sum_{t=1}^{T}\sup_{|\theta - \theta_0| < \delta}|R_t(\theta)| < C\right) \to 1 \qquad (32)$$

with $\delta$ sufficiently small. Therefore, we can rewrite equation (29) as



T T 

= V(l + /,(£i))a,(0o)Vt(^o) + E ( - -^(-)^*(^o)Vt(^o)'Pt(^o) 



t=l 



+{i + h{^)){At{eoyptieo) + atieo)'Bti0o)))io-eo)+ we-Oo ^tr^o) 

Vf 
where

$$P\left(\sup_{|\theta - \theta_0| < \delta}|R(\theta)| < C\right) \to 1. \qquad (33)$$



Note that 



e({1 + h{^)){At{eoypt{eo) + atieo)'BtiOo)) 

EUAtiOoYptiOo) + atieoYBt{eo))EJl + hi^) 
V V r]f 

E(iM0oypt{eo) + M6ofB,ieo)))E(i + h{^: 



Vf 

= (34) 
Therefore, it may be proved from the ergodic theorem that 

T-i V(l + h{^){A,{OoyptiOo) + at{eofB,i6o)) (35) 

T 

T-i V-/i(-)a,(0o)Vt(^o)'pt(0o) ^ MEf^iii^)) (36) 



Hence, we have 



(Me(^/i(^)) + op(l))(0 - 0o)+ II - ^0 f i?(0) 



T 



= T-i5^(l + M^))fco (37) 
where $o_p(1)$ does not depend on $\theta$. It may be proved from the martingale central limit theorem that



1 



then it follows from the same argument as in Hall and Yao (2003) that

$$\hat\theta_T - \theta_0 = O_p(T^{-1/2})$$



and 



op{l)){6 - Oo) = V(l + /i(-))fco 

^ Vf 



t=i 



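Thus,

$$T^{1/2}(\hat\theta_T - \theta_0) \xrightarrow{d} N\left(0,\ \frac{E\left(1 + h\left(\frac{\varepsilon}{\eta_f}\right)\right)^2}{\left(E\left(\frac{\varepsilon}{\eta_f}h'\left(\frac{\varepsilon}{\eta_f}\right)\right)\right)^2}M^{-1}\right).$$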
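A.5 Proof of Proposition 1

Proof. Define the likelihood ratio function $G(\eta) = E\left(\log\frac{\frac{1}{\eta}f\left(\frac{\varepsilon}{\eta}\right)}{f(\varepsilon)}\right)$. Suppose $G(\eta)$ has no local extremal values. Since $\log(x) \le 2(\sqrt{x} - 1)$,

$$G(\eta) \le 2E\left(\sqrt{\frac{\frac{1}{\eta}f\left(\frac{\varepsilon}{\eta}\right)}{f(\varepsilon)}} - 1\right) = 2\int_{-\infty}^{+\infty}\sqrt{\frac{1}{\eta}f\left(\frac{x}{\eta}\right)f(x)}\,dx - 2 \le \int_{-\infty}^{+\infty}\left(\frac{1}{\eta}f\left(\frac{x}{\eta}\right) + f(x)\right)dx - 2 = 0.$$

The equality holds if and only if $\eta = 1$. Therefore, $\eta = 1$ is the unique maximum of $G(\eta)$. □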



A.6 Proof of Theorem 4

In order to show asymptotic normality, we first introduce some notation and derive a lemma. For convenience, we denote $\tilde y_0 = \partial\log v_t(\gamma_0)/\partial\gamma$ and $y_0 = E(\tilde y_0)$, so $k_0 = (1/\sigma_0, \tilde y_0')'$ and $\bar k_0 = Ek_0 = (1/\sigma_0, y_0')'$. Also, let $M = E(k_0k_0')$, $N = \bar k_0\bar k_0'$ and $V = \mathrm{Var}(\tilde y_0)^{-1}$. All the expectations above are taken under the true density $g$.

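Lemma 3. The following claims hold:

1. The inverse of $M$ in block form is

$$M^{-1} = \begin{pmatrix}\sigma_0^2(1 + y_0'Vy_0) & -\sigma_0y_0'V\\ -\sigma_0Vy_0 & V\end{pmatrix}. \qquad (42)$$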
3. $M^{-1}NM^{-1} = M^{-1}NM^{-1}NM^{-1} = \sigma_0^2e_1e_1'$, where $e_1$ is a unit column vector of the same length as $\theta$, with first entry one and all other entries zero.



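Proof. The proof uses the Moore-Penrose pseudo inverse described in Ben-Israel and Greville (2003). Observe that

$$M = \bar k_0\bar k_0' + \begin{pmatrix}0 & 0\\ 0 & \mathrm{Var}(\tilde y_0)\end{pmatrix}. \qquad (43)$$

Using the Moore-Penrose pseudo inverse,

$$M^{-1} = \begin{pmatrix}0 & 0\\ 0 & \mathrm{Var}(\tilde y_0)\end{pmatrix}^{+} + H = \begin{pmatrix}0 & 0\\ 0 & V\end{pmatrix} + H, \qquad (44)$$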



where H is formed by the elements below: 



/3 



w 



1 + yoVyo 

KSo)' 



m 



w 



n 



V 
H



-vw 



-mn 



\m\ 



liulPllmll^ 



mw 







So (42) is obtained by plugging $H$ into (44). The remaining two points of the lemma follow from simple matrix manipulation. □

Next we return to the proof of Theorem 4.



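Proof. According to Theorem 3.4 in Newey and McFadden (1986), $(\tilde\theta_T, \hat\eta_f, \hat\theta_T)$ are jointly $T^{1/2}$-consistent and asymptotically normal. The asymptotic variance matrix is

$$G^{-1}E\left(s(x_t,\theta_0,\eta_f,\theta_0)\,s(x_t,\theta_0,\eta_f,\theta_0)'\right)G'^{-1}, \qquad (45)$$

where $G = E(\nabla s(x_t,\theta_0,\eta_f,\theta_0))$. View this matrix as $3 \times 3$ blocks, with the asymptotic variances of $(\tilde\theta_T, \hat\eta_f, \hat\theta_T)$ on the first, second and third diagonal blocks. We now calculate the second and third diagonal blocks. The expected Jacobian matrix $G$ can be decomposed into

$$G = E\begin{pmatrix}\nabla_\theta s_1(x_t,\theta_0) & 0 & 0\\ \nabla_\theta s_2(x_t,\theta_0,\eta_f) & \nabla_\eta s_2(x_t,\theta_0,\eta_f) & 0\\ 0 & \nabla_\eta s_3(x_t,\eta_f,\theta_0) & \nabla_\theta s_3(x_t,\eta_f,\theta_0)\end{pmatrix}.$$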
Denote the corresponding blocks as $G_{ij}$, $i, j = 1, 2, 3$. Direct calculation yields

$$G_{11} = -2M,$$

Vf ^ Vf Vf^ 

G22 = \E(h{-)-) 
Vf ^ Vf Vf^ 

G32 = G21' 

$$G_{33} = E\left(h'\left(\frac{\varepsilon}{\eta_f}\right)\frac{\varepsilon}{\eta_f}\right)M.$$

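The second diagonal block depends on the second row of $G^{-1}$ and $s(x_t,\theta_0,\eta_f,\theta_0)$. Since $G$ is block lower triangular, the second row of $G^{-1}$ is $\left(-G_{22}^{-1}G_{21}G_{11}^{-1},\ G_{22}^{-1},\ 0\right)$. So the asymptotic variance of $\hat\eta_f$ is $G_{22}^{-1}E(q_2q_2')G_{22}'^{-1}$, where

$$q_2 = -G_{21}G_{11}^{-1}s_1(x_t,\theta_0) + s_2(x_t,\theta_0,\eta_f).$$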



The last step uses the second point of Lemma 3. So (17) is obtained by plugging in the expressions for $G_{22}$ and $q_2$. Similarly, the third row of $G^{-1}$ is

$$G_{33}^{-1}\left(G_{32}G_{22}^{-1}G_{21}G_{11}^{-1},\ -G_{32}G_{22}^{-1},\ I\right).$$
The asymptotic variance of $\hat\theta$ is $G_{33}^{-1}E(q_3q_3')G_{33}'^{-1}$, where

= -(1 + h{-)){ko - fco) - ^i?(/i(-)-)fcofc'o(W)-'fco(e' - 1) 
T]f I r]f r]f 

$$= -h_1(k_0 - \bar k_0) - \frac{1}{2}(Eh_2)(\varepsilon^2 - 1)\bar k_0.$$

The last step uses the second point of Lemma 3. Then



$$Eq_3q_3' = Eh_1^2(M - N) + \frac{1}{4}(Eh_2)^2E(\varepsilon^2 - 1)^2N = Eh_1^2M + \left[\frac{1}{4}(Eh_2)^2E(\varepsilon^2 - 1)^2 - Eh_1^2\right]N.$$

Therefore, (16) is obtained by plugging in the expressions for $G_{33}$ and $Eq_3q_3'$ and applying the third point of Lemma 3.

The asymptotic covariance between $\hat\theta$ and $\hat\eta_f$ is $G_{33}^{-1}E(q_3q_2')G_{22}'^{-1}$, which can be computed directly using the second point of Lemma 3. The same formula recurs in the asymptotic covariance between $\tilde\theta$ and $\hat\eta_f$, which is $G_{11}^{-1}E(q_1q_2')G_{22}'^{-1}$.

Finally, the asymptotic covariance between $\tilde\theta$ and $\hat\theta$ is $G_{11}^{-1}E(q_1q_3')G_{33}'^{-1}$, denoted by $\Xi$. It follows from the third point of Lemma 3 that



2E{h2) 2 V '^Eh2 
which concludes the proof.



□ 



A.7 Proof of Theorem 5

Proof. Following an idea similar to GMM, we may prove



I 








Xj'T 2^-21 


G22 








Gs2 


G33 



TX~\e-eo) 

T^fjf - r]f) 



( 1 v^T 



iEti**(^0 + op(i) 
;^Er=4(i + Mt)) + Mi) 
V ;^Ef=i(i + Mt))fco + op(i) 



Clearly, the corresponding weighting vector for $\sqrt{T}(\hat\theta_T - \theta_0)$ is

$$\left(G_{33}^{-1}G_{32}G_{22}^{-1}G_{21}G_{11}^{-1},\ -G_{33}^{-1}G_{32}G_{22}^{-1},\ G_{33}^{-1}\right).$$



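Note that

$$-G_{33}^{-1}G_{32}G_{22}^{-1} = -\sigma_0\eta_f\left(E\left(h'\left(\frac{\varepsilon}{\eta_f}\right)\frac{\varepsilon}{\eta_f}\right)\right)^{-1}e_1,$$

so the sub-matrices corresponding to the $\gamma$ parameters are zeros. Therefore, the first step has no effect on the central limit theorem of $\hat\gamma_T$. The result follows from Lemma 3. In terms of $\hat\sigma_T$, its convergence rate becomes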
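A.8 Proof of Proposition 2

Proof. Denote the random variables $\kappa_G = (1 - \varepsilon^2)/2$ and $\kappa_2 = h_1(\varepsilon/\eta_f)/E(h_2(\varepsilon/\eta_f))$. We show that the optimal weights for $\sigma$ and $\gamma$ are the same. From Lemma 3, Theorem 4 and (27), for $\sigma$ the numerator in (26) is

$$(\Sigma_G)_{1,1} - \Xi_{1,1} = \sigma_0^2(1 + y_0'Vy_0)E\kappa_G^2 - \sigma_0^2E\kappa_G^2 + \sigma_0^2y_0'Vy_0E(\kappa_G\kappa_2) = \sigma_0^2y_0'Vy_0E\left(\kappa_G(\kappa_G + \kappa_2)\right).$$

The denominator in (26) is

$$(\Sigma_G)_{1,1} + (\Sigma_2)_{1,1} - 2\Xi_{1,1} = \sigma_0^2(1 + y_0'Vy_0)(E\kappa_G^2 + E\kappa_2^2) + \sigma_0^2(E\kappa_G^2 - E\kappa_2^2) - 2\sigma_0^2E\kappa_G^2 + 2\sigma_0^2y_0'Vy_0E(\kappa_G\kappa_2) = \sigma_0^2y_0'Vy_0E(\kappa_G + \kappa_2)^2.$$

Therefore we obtain $w_1^* = E(\kappa_G(\kappa_G + \kappa_2))/E(\kappa_G + \kappa_2)^2$. Now we compute the weights corresponding to $\gamma$. For $i = 2, \ldots, 1 + p + q$, let $j = i - 1$; the numerator and denominator in (26) then follow from (17) in the same way.

Therefore all the optimal aggregation weights are the same. □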